Abstract — The use of multi-antenna arrays in both transmission and reception has been shown to dramatically increase the throughput of wireless communication systems. As a result, there has been considerable interest in characterizing the ergodic average of the mutual information for realistic correlated channels. Here, an approach is presented that provides analytic expressions not only for the average, but also for the higher cumulant moments of the distribution of the mutual information for zero-mean Gaussian MIMO channels with the most general multipath covariance matrices, when the channel is known at the receiver. These channels include multi-tap delay paths, as well as general channels with covariance matrices that cannot be written as a Kronecker product, such as dual-polarized antenna arrays with general correlations at both the transmitter and receiver ends. The mathematical methods are formally valid for large antenna numbers, in which limit it is shown that all cumulant moments of the distribution other than the first two scale to zero. Thus, it is confirmed that the distribution of the mutual information tends to a Gaussian, which enables one to calculate the outage capacity. These results are quite accurate even in the case of a few antennas, which makes this approach applicable to realistic situations.

Index Terms — Wideband; Multipath; Beamforming; Capacity; 
Multiple Antennas; Random Matrix Theory; Replicas; Side 
Information 



I. Introduction 

FOLLOWING pioneering work by [1], [2], it has become clear that the use of multi-antenna arrays in transmission and reception can lead to significantly increased bit-rates. This has led to a flurry of work calculating the narrowband ergodic mutual information of such systems, i.e. the mutual information averaged over realizations of the channel, using a variety of channel models and analytic techniques. For example, the ergodic capacity was calculated asymptotically for a large number of antennas [3]-[12], or for large and small signal-to-noise ratios [10], [13], using a variety of assumptions for the statistics of the fading channel [10], [14].

To better understand the characteristics of realistic information transmission through fading channels, it is important to analyze the full distribution of the mutual information over realizations of fading. For example, the outage capacity [15] is sometimes a more realistic measure of capacity for delay-constrained fading channels. In addition, the distribution of the mutual information provides information about the available diversity in the system [16]: the smaller the variance, the lower the probability of outage when transmitting at a fixed rate. Finally, having an analytic expression for the distribution of the mutual information allows one to simulate a system of multiple users in a simple way [17]. Recently, [8], [14] analytically calculated the first few moments of the distribution of the narrowband mutual information, asymptotically for large antenna numbers with spatial correlations. This analysis showed that the distribution is approximately Gaussian even for a few antennas, as also seen in [17], [18]. More recently, other methods were devised to calculate all moments of the mutual information distribution exactly for some channel types [19]-[24]. Also, [10] calculated the ergodic mutual information in the large-antenna limit for independent non-identically distributed (IND) channels, and extended their results to correlated channels with special restrictions on the correlations of different paths.

The above literature did not analyze the statistics of the mutual information for Gaussian channels with general non-Kronecker-product correlations [25]-[28]. These types of channels are becoming increasingly important to study, as it has recently been proposed that they appear in several situations, such as channels for generally correlated antennas with multiple polarizations [27], [28].

Furthermore, the above works have generally focused on the case of narrowband flat-fading channels. However, the use of wideband signals with non-trivial resolvable multipath necessitates the analysis of the mutual information in the presence of multipath. It was shown in [29], [30] that the capacity of the wideband channel depends only on narrowband quantities, such as the total average power. Subsequently, other authors have analyzed the wideband ergodic capacity using asymptotic methods [7], [31]. In a first attempt to describe the wideband distribution of the mutual information, [32] suggested that the distribution is Gaussian if the number of independent paths is large. However, in many instances of interest the number of paths seen is small [27], [33]. It would thus be useful to analyze, in an analytic fashion, the effects of multipath on the wideband mutual information of Gaussian MIMO fading channels with arbitrary multipath behavior. Although the exact methods mentioned above [19]-[24] can calculate all moments of the distribution for narrowband channels, they cannot be generalized to multipath channels. Therefore, to make progress, one needs to rely on asymptotic methods.

In this paper we extend the work done in [14] to provide analytic expressions for the statistics of the mutual information in the presence of multipath with general spatially correlated channels. We assume that the instantaneous fading channel is known to the receiver but not to the transmitter. Our results generalize the mutual information results of [10] for Gaussian channels to arbitrary zero-mean Gaussian correlated channels. The paths may or may not have the same delay. The methods used here apply the concept of replicas, which was initially introduced in statistical physics for understanding random systems [34], but in recent years has seen several applications in information theory [4], [7], [35]-[37].

In particular, we obtain the following results: 

• We use the replica method to calculate the moment generating function of the mutual information, averaging over general multipath, non-Kronecker-product channels. Using this approach we derive expressions for its first three cumulant moments (mean, variance and skewness). As in [14], we find that for large antenna numbers n the mean of the distribution is of order n, the variance is of order unity, and the third cumulant is of order 1/n, while all higher cumulants scale with higher powers of 1/n. Thus, for large n the mutual information distribution approaches a Gaussian. Therefore, the outage mutual information can be expressed simply in terms of the mean and the variance of the distribution (Section IV).

• We optimize the mean mutual information with respect to the input signal distribution to obtain the ergodic capacity (Section IV-B).

• We demonstrate the dependence of the whole distribution of the mutual information on the specifics of the channel by calculating the mean and variance of the mutual information for a number of simple multipath channels.

• We also compare these Gaussian distributions with numerically generated ones and find very good agreement, even for a few antennas. This validates the analytical approach presented here for use in realistic situations with small antenna numbers.



A. Outline 

In the remainder of this section we define the relevant notation. In Section II we describe the MIMO channels to which our method applies, in both the temporal and the frequency domain. In Section III we define the wideband mutual information, and in Section III-A the statistics of its distribution. Subsequently, in Section IV the mathematical framework of the method used to calculate the generating function of the mutual information is presented. Also, the calculation of the ergodic capacity (Section IV-B), its variance (Section IV-C) and the higher-order moments of the distribution (Section IV-D) are discussed. Section IV-E deals with an alternative derivation of the results for the case when the receive correlation matrix is the same for all paths, while Section IV-F briefly discusses the case of narrowband multipath, where all paths arrive at the same delay tap. In Section V a few specific cases are analyzed analytically and compared to numerical Monte-Carlo calculations. Appendix I summarizes a number of complex integral identities employed in the main section, while Appendices II and III contain some details for various steps in Section IV. Appendix IV includes some guiding details of the calculation of the higher-order terms in Section IV-D. Finally, Appendix V describes the procedure for evaluating the capacity-achieving transmission covariance Q.



B. Notation 

1) Vectors/Matrices: Throughout this paper, we will use bold-faced upper-case letters to denote matrices, e.g. X, with elements given by X_ab, bold-faced lower-case letters for column vectors, e.g. x, with elements x_a, and non-bold letters for scalar quantities. Also, the superscripts T and † will indicate the transpose and Hermitian conjugate operations, and I_n will represent the n-dimensional identity matrix.

Finally, the superscripts/subscripts t and r will be used for 
quantities referring to the transmitter and receiver, respectively. 

2) Gaussian Distributions: The real Gaussian distribution with zero mean and unit variance will be denoted by N(0, 1), while the corresponding complex, circularly symmetric Gaussian distribution will be denoted by CN(0, 1).

3) Order of Number of Antennas O(n^k): We will be examining quantities in the limit when the number of transmit antennas n_t and the number of receive antennas n_r are both large but their ratios are fixed and finite. We will collectively denote the order in an expansion over the antenna numbers as O(n), O(1), O(1/n), etc., irrespective of whether the particular term involves n_t or n_r.

4) Integral Measures: Two general types of integrals over matrix elements will be dealt with, and the following notation for their corresponding integration measures will be adopted. In the first type we will be integrating over the real and imaginary parts of the elements of a complex m_rows × m_cols matrix X. The integral measure will be denoted by

DX = \prod_{a=1}^{m_{\mathrm{rows}}} \prod_{\alpha=1}^{m_{\mathrm{cols}}} \frac{d\,\mathrm{Re}(X_{a\alpha})\, d\,\mathrm{Im}(X_{a\alpha})}{2\pi}    (1)

The second type of integration is over pairs of complex square matrices T and R. Each element of T and R will be integrated over a contour in the complex plane (to be specified). The corresponding measure will be described as

d\mu(T, R) = \prod_{a,b} \frac{dT_{ab}\, dR_{ab}}{2\pi i}    (2)

In addition, we will define a measure over a set of L pairs of matrices {T^l, R^l}, for l = 0, ..., L-1, to be given simply by

d\mu(\{T^l, R^l\}) = \prod_{l=0}^{L-1} d\mu(T^l, R^l)    (3)



5) Expectations: We will use the notation ⟨·⟩ to indicate an expectation over instantiations of the fading channel. We will reserve the notation E[·] for expectations over transmitted signals.

II. Multipath MIMO channel model 

We consider the case of single-user transmission from n_t transmit antennas at a base station to n_r receive antennas at a mobile terminal over a fading channel with multiple paths and a finite bandwidth. We assume that the channel coefficients are known to the receiver, but not to the transmitter. The transmitted signal can be written in terms of a discrete time series representing the signals at discrete time steps mτ, for m ∈ Z, with τ the inverse of the available bandwidth. Thus we can use the following simple tap-delay model [29], [38]

y_m = \sum_{l=0}^{L-1} G_l\, x_{m-m_l} + z_m    (4)

where x_m is the n_t-dimensional signal vector transmitted at time mτ. Similarly, y_m and z_m are the corresponding n_r-dimensional received signal and noise vectors. z_m is assumed to be an i.i.d. vector with each of its elements drawn from CN(0, 1), while G_l is the n_r × n_t complex channel matrix at delay time m_l τ, where m_l is integer-valued. Of course, G_l can be interpreted in a wider sense as an appropriately filtered version of the channel over the delay interval (m_{l-1}τ, m_l τ] [38]. Note that in general the paths need not all arrive with different delays, i.e. we have m_{l+1} ≥ m_l, with equality when the l-th and (l+1)-th paths arrive within the same delay interval. In fact, all paths may be assumed to arrive over the same delay interval.

The analysis of multipath channels is simplified considerably by Fourier-transforming the transmitted and received signal vectors. In this case the Fourier-transformed received signal is solely a function of the corresponding Fourier component of the transmitted signal

y(\omega) = G(\omega)\, x(\omega) + z(\omega)    (5)

where the Fourier transform of the transmitted signal vector x(ω) is defined by

x(\omega) = \sum_{m=-\infty}^{\infty} e^{i\omega m\tau}\, x_m    (6)

with similar definitions for the Fourier components y(ω), z(ω). G(ω) is the Fourier transform of the channel impulse response, given by

G(\omega) = \sum_{l=0}^{L-1} e^{i\omega m_l\tau}\, G_l    (7)

Note that (6) implies that each symbol vector x(ω) transmitted over a single frequency is spread over infinite time. As a result, it sees no interference from other frequency components due to multipath. In practice, and in order to avoid mixing between close frequencies due to Doppler fading, one has to transmit each symbol over a finite time window, therefore essentially using a discrete set of frequency components, e.g. ω_k = 2πk/(Mτ), with k = 0, ..., M-1. The number of discrete frequency components M is usually chosen so that the symbol duration is less than the coherence time t_coh of the channel, i.e. M < t_coh/τ. One can then send different symbols one after the other. However, there is residual intersymbol interference (ISI) due to multipath, and the finite Fourier modes are no longer orthogonal. Various methods have been devised to restore orthogonality, such as the inclusion of a cyclic prefix [39]. These issues will be ignored here, and we will use the discrete Fourier mode version of (5), given by

y_{pk} = G_k\, x_{pk} + z_{pk}    (8)

where the index p represents the symbol index, k is the Fourier mode index with k = 0, ..., M-1, and G_k is the corresponding channel Fourier component for ω_k = 2πk/(Mτ). The vectors x_pk (and similarly y_pk, z_pk) have been normalized so that z_k, the Fourier transform of the noise vector z_m, is i.i.d. with elements drawn from CN(0, 1). Also, the input signal in each frequency component x_k is assumed Gaussian with covariance E[x_pk x_pk'^†] = δ_kk' Q_k, normalized so that Tr{Q_k} = n_t. For completeness, we rewrite the Fourier transform of the channel in discrete form as

G_k = \sum_{l=0}^{L-1} e^{2\pi i k m_l/M}\, G_l    (9)

As mentioned earlier, the channel matrices G_k are assumed to be known at the receiver but not at the transmitter.
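The diagonalization in (8)-(9) can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a scalar (single-antenna) channel, the sign convention e^{+iωmτ} of (6), and a circular convolution in time (i.e. an ideal cyclic prefix, which the text above deliberately ignores), so that each output Fourier component is exactly G_k times the corresponding input component. The tap values, delays and input sequence are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import cmath

def dft(seq):
    """DFT with the sign convention of (6): X_k = sum_m exp(+2*pi*i*k*m/M) x_m."""
    M = len(seq)
    return [sum(cmath.exp(2j * cmath.pi * k * m / M) * seq[m] for m in range(M))
            for k in range(M)]

def circular_channel(taps, delays, x):
    """Scalar tap-delay channel y_m = sum_l g_l x_{m - m_l}, noise omitted,
    with the time index taken modulo M (i.e. assuming a cyclic prefix)."""
    M = len(x)
    return [sum(g * x[(m - d) % M] for g, d in zip(taps, delays)) for m in range(M)]

def channel_fourier(taps, delays, M):
    """G_k = sum_l exp(2*pi*i*k*m_l/M) g_l, the scalar version of (9)."""
    return [sum(g * cmath.exp(2j * cmath.pi * k * d / M) for g, d in zip(taps, delays))
            for k in range(M)]

M = 8
taps = [1.0 + 0.5j, -0.3 + 0.2j, 0.1j]  # hypothetical path gains g_l
delays = [0, 1, 3]                      # hypothetical integer delays m_l
x = [complex(m % 3 - 1, (m * 7) % 5 - 2) for m in range(M)]  # arbitrary input

Y = dft(circular_channel(taps, delays, x))
X = dft(x)
G = channel_fourier(taps, delays, M)
max_err = max(abs(Y[k] - G[k] * X[k]) for k in range(M))
print(f"max |y_k - G_k x_k| = {max_err:.2e}")
```

Without the modulo (a linear convolution of a finite block), the equality holds only approximately, which is the residual ISI mentioned above.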

A. Channel Statistics 

Next, we would like to characterize the statistics of the channel matrices G_l, which are random due to fading. In particular, they are assumed to be zero-mean, independent Gaussian random matrices. In addition, we assume the correlations between elements of G_l to be as follows:

\langle G_{l,ai}\, G^{*}_{l',\beta j} \rangle = \delta_{ll'}\, \frac{\rho_l}{n_t}\, T_{l,ij}\, R_{l,a\beta}    (10)

where the expectation ⟨·⟩ is over the fading matrices G_l, and ρ_l, T_l and R_l are the signal-to-noise ratio and the n_t- and n_r-dimensional correlation matrices for the l-th path at the transmitter and receiver, respectively. Underlying the structure of the above correlations is the assumption that different paths have uncorrelated channels [40]. Each path is assumed to have correlations in the form of a Kronecker product. This is certainly valid when each path corresponds to a single scattered wave, in which case each of the corresponding correlation matrices has unit rank. The above channel model is in agreement with the channel models adopted in third-generation standards [27].
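To make the per-path model (10) concrete, one way to draw a single path matrix with the prescribed second moments is G_l = sqrt(ρ_l/n_t) L_R W L_T^T, where T_l = L_T L_T^† and R_l = L_R L_R^† are Cholesky factorizations and W has i.i.d. CN(0, 1) entries. This construction, the helper names and the numerical values below are our own illustrative assumptions; the Monte-Carlo estimate simply checks one entry of (10).

```python
import math
import random

def cholesky(A):
    """Lower-triangular L with A = L L^H (A Hermitian positive definite)."""
    n = len(A)
    L = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            if i == j:
                L[i][i] = complex(math.sqrt(s.real))
            else:
                L[i][j] = s / L[j][j]
    return L

def sample_path_matrix(LT, LR, rho, rng):
    """One n_r x n_t draw of G = sqrt(rho/n_t) * LR W LT^T with W_bk ~ CN(0, 1)."""
    nt, nr = len(LT), len(LR)
    sd = math.sqrt(0.5)  # real and imaginary parts each carry variance 1/2
    W = [[complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for _ in range(nt)]
         for _ in range(nr)]
    c = math.sqrt(rho / nt)
    LW = [[sum(LR[a][b] * W[b][k] for b in range(nr)) for k in range(nt)]
          for a in range(nr)]
    return [[c * sum(LW[a][k] * LT[i][k] for k in range(nt)) for i in range(nt)]
            for a in range(nr)]

# Illustrative 2x2 correlation matrices and SNR (assumed values).
T = [[1.0, 0.5], [0.5, 1.0]]
R = [[1.0, 0.3], [0.3, 1.0]]
rho, nt = 2.0, 2
LT, LR = cholesky(T), cholesky(R)
rng = random.Random(1)

n_samples = 20000
acc = 0j
for _ in range(n_samples):
    G = sample_path_matrix(LT, LR, rho, rng)
    acc += G[0][0] * G[1][1].conjugate()  # estimates <G_{a=0,i=0} G*_{beta=1,j=1}>
estimate = acc / n_samples
target = rho / nt * T[0][1] * R[0][1]     # right-hand side of (10)
print(estimate, target)
```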

We comment that an interesting special case occurs when all the delays m_l take the same value, i.e. all paths arrive within the same τ interval (see Section IV-F). This represents a narrowband channel with non-Kronecker-product (or non-factorizable) correlations. In other words, we could write the analogous simple narrowband channel relation

y_p = G\, x_p + z_p    (11)

where

G = \sum_{l=0}^{L-1} G_l    (12)

The matrices G_l have correlations of the form above. We note that such a form includes general models of polarization mixing with general correlation matrices between the different polarization components [27], [28]. For example, the correlations of a multipath channel with antennas of different polarizations can be written compactly as

\langle G_{ai}\, G^{*}_{\beta j} \rangle = \sum_{l} \frac{\rho_l}{n_t} \left[ T^{l,v}_{ij}\, R^{l,v}_{a\beta} + \chi\, T^{l,h}_{ij}\, R^{l,h}_{a\beta} \right]    (13)

where the sum is over all paths, ρ_l is the signal-to-noise ratio of each path l, T^{l,v} and T^{l,h} are the correlation matrices of the vertical and horizontal polarization components of the transmitter antennas for the l-th path (and similarly for the receiver arrays), and χ is the polarization mixing ratio.

Finally, it should be stressed that the most general narrowband zero-mean Gaussian model, including the recently proposed independent non-identically distributed (IND) channel, can be expressed in the form of (10), (12), since the correlations of any zero-mean Gaussian matrix can be written as

\langle G_{ai}\, G^{*}_{\beta j} \rangle = \sum_{l} T_{l,ij}\, R_{l,a\beta}    (14)

To see this, let l = 1, ..., (n_t n_r)^2, and set the matrices T_l, R_l to have zero entries except for the elements ij and aβ, respectively, when the index l takes the value l(i, j, a, β) = i + n_t(j-1) + n_t^2(a-1) + n_t^2 n_r(β-1). The non-zero values of these matrices can be chosen to be, for example, T_{l(i,j,a,β),ij} = ⟨G_ai G*_βj⟩ and R_{l(i,j,a,β),aβ} = 1. Although this mapping is not unique, it demonstrates the generality of our method.
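The index map used above can be checked mechanically. The sketch below (with hypothetical helper names, and an arbitrary made-up covariance table standing in for ⟨G_ai G*_βj⟩) verifies that l(i, j, a, β) takes every value 1, ..., (n_t n_r)^2 exactly once, and that the resulting single-entry matrices T_l, R_l reproduce the covariance through (14).

```python
import itertools

def path_index(i, j, a, b, nt, nr):
    """l(i, j, a, beta) = i + nt(j-1) + nt^2(a-1) + nt^2 nr(beta-1), all indices 1-based."""
    return i + nt * (j - 1) + nt**2 * (a - 1) + nt**2 * nr * (b - 1)

def all_indices(nt, nr):
    return itertools.product(range(1, nt + 1), range(1, nt + 1),
                             range(1, nr + 1), range(1, nr + 1))

def decompose(cov, nt, nr):
    """Map a covariance table cov[(i, j, a, b)] = <G_ai G*_bj> to pairs (T_l, R_l)."""
    terms = {}
    for (i, j, a, b) in all_indices(nt, nr):
        T = [[0j] * nt for _ in range(nt)]
        R = [[0j] * nr for _ in range(nr)]
        T[i - 1][j - 1] = cov[(i, j, a, b)]  # the only non-zero entry of T_l
        R[a - 1][b - 1] = 1.0                # the only non-zero entry of R_l
        terms[path_index(i, j, a, b, nt, nr)] = (T, R)
    return terms

def reconstruct(terms, i, j, a, b):
    """Right-hand side of (14): sum_l T_{l,ij} R_{l,a beta}."""
    return sum(T[i - 1][j - 1] * R[a - 1][b - 1] for T, R in terms.values())

nt, nr = 2, 3
indices = sorted(path_index(*idx, nt, nr) for idx in all_indices(nt, nr))
assert indices == list(range(1, (nt * nr) ** 2 + 1))  # bijection onto 1..(nt nr)^2

# Arbitrary illustrative covariance entries (not a physical channel).
cov = {idx: complex(sum(idx), idx[0] - idx[3]) for idx in all_indices(nt, nr)}
terms = decompose(cov, nt, nr)
```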

It should be noted that, since the receiver/mobile terminal is usually assumed to be located deep inside the clutter, the received signal tends to have a wide angle spread, thereby making the differences in the angles of arrival of different paths less distinguishable. Therefore, it is sometimes reasonable to assume that the receiver correlations R_l are path-independent, i.e.

R_l = R, \quad l = 0, \dots, L-1    (15)

This assumption is not as easily met at the transmitter/base station, where the nearest scatterers are typically further separated, thereby making the T_l typically different. A further simplification of the above is the case when the receive antennas are uncorrelated, which is discussed in [29].

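As a result of the above, G_k, the Fourier transform (9) of the G_l, is also Gaussian with correlations

\langle G_{k,ai}\, G^{*}_{k',\beta j} \rangle = \sum_{l=0}^{L-1} \frac{\rho_l}{n_t}\, e^{2\pi i (k-k') m_l/M}\, T_{l,ij}\, R_{l,a\beta}    (16)

For the case of narrowband channels mentioned above in (11), only a single frequency component, k = 0, is present, and G_0 = G with G given in (12).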

III. Wideband Mutual Information 

The mutual information of each of the frequency components k is given by [1], [2]

I_k = \log\det\left(I_{n_r} + G_k Q_k G_k^{\dagger}\right)    (17)

The log above (and throughout the whole paper) represents the natural logarithm, and thus I is expressed in nats. The total mutual information over all frequency components is then

I = \sum_{k=0}^{M-1} I_k    (18)
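Equations (17)-(18) translate directly into code. The following pure-Python sketch is illustrative only: the helper names are ours, Q_k = Q is taken frequency-independent (as assumed later in the text), and the log-determinant is computed by Gaussian elimination, using the fact that I_{n_r} + G_k Q G_k^† is Hermitian positive definite, so its determinant is real and positive.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def hermitian(A):
    """Conjugate transpose A^dagger."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def logdet(A):
    """log|det A| by Gaussian elimination with partial pivoting (A is not modified)."""
    n = len(A)
    A = [[complex(x) for x in row] for row in A]
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0:
            return -math.inf
        A[c], A[p] = A[p], A[c]
        total += math.log(abs(A[c][c]))
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return total

def mutual_information(G_list, Q):
    """I = sum_k log det(I_nr + G_k Q G_k^dagger), in nats, eqs. (17)-(18)."""
    total = 0.0
    for G in G_list:
        nr = len(G)
        GQG = matmul(matmul(G, Q), hermitian(G))
        A = [[(1.0 if a == b else 0.0) + GQG[a][b] for b in range(nr)] for a in range(nr)]
        total += logdet(A)
    return total

# Two frequency components of a hypothetical 2x2 channel, with Q = I (Tr Q = n_t = 2).
Q = [[1.0, 0.0], [0.0, 1.0]]
G_list = [[[1.0, 0.0], [0.0, 2.0]], [[0.5j, 0.0], [0.0, 1.0]]]
print(mutual_information(G_list, Q))
```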



A. Statistics of Mutual Information 

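The distribution of the mutual information can be characterized through its moments. These moments can be evaluated by first calculating the moment generating function g(ν) of I

g(\nu) = \left\langle \prod_{k=0}^{M-1} \det\left(I_{n_r} + G_k Q_k G_k^{\dagger}\right)^{-\nu} \right\rangle    (19)
       = \left\langle e^{-\nu I} \right\rangle    (20)

Assuming that g(ν) is analytic at least in the vicinity of ν = 0, we can express log g(ν) as follows

\log g(\nu) = \sum_{p=1}^{\infty} \frac{(-\nu)^p}{p!}\, C_p    (21)

where C_p is the p-th cumulant moment of I. For example, the ergodic mutual information, i.e. the average of the distribution, is given by

C_1 = \langle I \rangle = \sum_{k=0}^{M-1} \langle I_k \rangle    (22)
    = \sum_{k=0}^{M-1} \left\langle \log\det\left(I_{n_r} + G_k Q_k G_k^{\dagger}\right) \right\rangle    (23)

Similarly, the variance of the distribution is

C_2 = \mathrm{Var}(I) = \left\langle (I - \langle I \rangle)^2 \right\rangle    (24)
    = \sum_{k,k'=0}^{M-1} \left( \langle I_k I_{k'} \rangle - \langle I_k \rangle \langle I_{k'} \rangle \right)    (25)

and the skewness of the distribution is

C_3 = \mathrm{Sk}(I) = \left\langle (I - \langle I \rangle)^3 \right\rangle    (26)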

and so forth. Note that since I_k depends only on G_k and Q_k, to evaluate the ergodic average we can perform the average for each term in the sum in (23) separately, neglecting any correlations between G_k's with different k indices. Thus, for evaluating the ergodic average, the only correlation of relevance is ⟨G_k,ai G*_k,βj⟩, which turns out to be k-independent, as seen in (16). Therefore, the only k-dependence of each term in the sum in (23) is through Q_k. As a result, the optimal Q_k will be k-independent. We will thus henceforth assume that Q_k is chosen to be a k-independent quantity Q. As a result, the wideband ergodic capacity becomes just M times its narrowband counterpart [29]. This k-independence of the mean mutual information will be of use in the next section. In contrast, in evaluating higher moments of the distribution such as the variance, as is easily seen from (25), we will have to consider cross-correlations between G_k and G_k'.
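As a rough numerical counterpart to (22)-(26), the cumulants can be estimated from Monte-Carlo samples of the mutual information. The sketch below (hypothetical helper names, plug-in estimators with 1/N normalization) also illustrates the remark above on cross-correlations: the variance of I = Σ_k I_k is the sum over all pairs (k, k') of covariances in (25), so perfectly correlated frequency components quadruple the variance when M = 2.

```python
def sample_cumulants(samples):
    """Plug-in estimates of C1 = <I>, C2 = <(I-<I>)^2> and C3 = <(I-<I>)^3>."""
    n = len(samples)
    mean = sum(samples) / n
    c2 = sum((s - mean) ** 2 for s in samples) / n
    c3 = sum((s - mean) ** 3 for s in samples) / n
    return mean, c2, c3

def variance_from_components(per_freq):
    """per_freq[s][k] = I_k in realization s; returns sum_{k,k'} Cov(I_k, I_k'), cf. (25)."""
    n, M = len(per_freq), len(per_freq[0])
    means = [sum(row[k] for row in per_freq) / n for k in range(M)]
    return sum(sum((row[k] - means[k]) * (row[kp] - means[kp]) for row in per_freq) / n
               for k in range(M) for kp in range(M))

# Toy data: two frequency components that are identical in every realization.
per_freq = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
totals = [sum(row) for row in per_freq]
print(sample_cumulants(totals))            # C1, C2, C3 of I = I_0 + I_1
print(variance_from_components(per_freq))  # the same C2, via the double sum (25)
```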

Finally, it should be emphasized that the distribution of the mutual information can also be completely characterized by the outage mutual information [15], obtained by inverting the expression below with respect to I_out

P_{\mathrm{out}} = \mathrm{Prob}\left(I < I_{\mathrm{out}}\right)    (27)

where Prob(I < I_out) is the probability that the mutual information is less than a given value I_out.
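If the distribution is approximated by a Gaussian with mean C_1 and variance C_2, as argued in the following sections, (27) can be inverted in closed form: I_out = C_1 + sqrt(C_2) Φ^{-1}(P_out), where Φ is the standard normal CDF. A minimal sketch using Python's standard-library statistics.NormalDist (the function names and numbers are our own, not taken from the paper):

```python
from statistics import NormalDist

def outage_mutual_information(mean, variance, p_out):
    """I_out such that Prob(I < I_out) = p_out, assuming I ~ N(mean, variance)."""
    return NormalDist(mu=mean, sigma=variance ** 0.5).inv_cdf(p_out)

def outage_probability(mean, variance, rate):
    """P_out = Prob(I < rate) under the same Gaussian approximation."""
    return NormalDist(mu=mean, sigma=variance ** 0.5).cdf(rate)

# Illustrative numbers: <I> = 10 nats, Var(I) = 4 nats^2, 10% outage.
i_out = outage_mutual_information(10.0, 4.0, 0.1)
print(i_out, outage_probability(10.0, 4.0, i_out))
```

The Gaussian form makes the diversity argument of the introduction explicit: for a fixed mean, a smaller variance pushes I_out closer to ⟨I⟩.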
IV. Mathematical Framework

The purpose of this paper is to analyze the statistics of the wideband mutual information I in (18) for general zero-mean Gaussian channels. In this section we describe the basic steps to derive analytic expressions for the first few cumulant moments of I, valid formally for large antenna numbers. In this limit it has been shown elsewhere [14], [17], [18], [41] that the narrowband mutual information distribution becomes asymptotically Gaussian. Thus, the first two moments suffice to describe the outage mutual information (27). Using the mathematical framework of [8], [14], we will show that this Gaussian character also holds for wideband channels.

To obtain the moments of the mutual information distribution we need to calculate g(ν) in (19) for ν in the vicinity of ν = 0. To achieve this we will employ the replica assumption discussed in [4], [8], [14], [42].

Assumption 1 (Replica Method): g(ν) evaluated for positive integer values of ν can be analytically continued to real ν, specifically in the vicinity of ν = 0^+.

This assumption, used also in [7], [36], [37], [43], [44], alleviates the problem of dealing with averages of logarithms of random quantities, since the logarithm is obtained after calculating g(ν).

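In Appendix II we show that g(ν) can be expressed as an integral over Mν × Mν complex matrices R^l, T^l, with l = 0, ..., L-1,

g(\nu) = \int d\mu(\{T^l, R^l\})\, e^{-S}    (28)

where the integration measure was defined in (2), (3) and

S = \log\det\Big( I_{n_r} \otimes I_{M\nu} + \frac{1}{\sqrt{n_t}} \sum_{l=0}^{L-1} R_l \otimes R^l \Big)
  + \log\det\Big( I_{n_t} \otimes I_{M\nu} + \frac{1}{\sqrt{n_t}} \sum_{l=0}^{L-1} \rho_l\, Q T_l \otimes \hat{T}^l \Big)
  - \sum_{l=0}^{L-1} \mathrm{Tr}\{ T^l R^l \}    (29)

where T̂^l is an Mν × Mν matrix related to T^l via

\hat{T}^l_{k\alpha,\,k'\beta} = e^{2\pi i (k-k') m_l/M}\, T^l_{k\alpha,\,k'\beta}    (30)

where we have explicitly written out the components of the matrices, with k, k' ranging from 0 to M-1 and α and β ranging from 1 to ν (see the notation in Appendix II).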

At this point ν is still a positive integer, which has to be taken to zero following Assumption 1 in order to be able to expand g(ν) for small ν, as in (21). However, since the integral in (28) cannot be performed exactly, we need to calculate it asymptotically in the limit of large antenna numbers n_t, n_r ≫ 1. Therefore, we need to interchange the limits n ≫ 1 and ν → 0^+.

Assumption 2 (Interchanging Limits) [14]: The limits n → ∞ and ν → 0^+ in evaluating g(ν) in (28) can be interchanged, by first taking the former and then the latter, without affecting the final answer.



A. Saddle-Point Analysis 

We now use Assumption 2 to calculate (28) asymptotically for large n_t, n_r, by deforming the integrals in (28) to pass through a saddle point. More details are given in [14]. To specify the structure of the saddle-point solution, i.e. the form of T^l, R^l at the saddle point, we assume as in [14] that the relevant saddle-point solution is invariant in the ν-dimensional replica space. However, in our case, since T^l, R^l are Mν-dimensional matrices, this is not enough to fully characterize the saddle point. Therefore, we will also assume that the saddle-point values of T^l, R^l are invariant in the M-dimensional frequency (k) space. This assumption, as we shall see, leads to a saddle-point value of S, and to an ergodic average of the mutual information, that is independent of inter-frequency correlations, in agreement with the correct answer, as discussed in Section III-A and [29].

Thus, at the saddle point T^l, R^l take the form T^l = t_l √n_t I_{Mν}, R^l = r_l √n_t I_{Mν}, where t_l and r_l are positive, still undetermined numbers of order unity in the number of antennas. A scaling factor of √n_t has been included for convenience, as will become evident below. Following [14], we analyze the integral in (28) by shifting the origin of integration to the saddle point, i.e. by rewriting T^l, R^l as

T^l = t_l \sqrt{n_t}\, I_{M\nu} + \delta T^l, \qquad R^l = r_l \sqrt{n_t}\, I_{M\nu} + \delta R^l    (31)

where δT^l, δR^l are Mν-dimensional matrices representing deviations around the saddle point. One can then expand S in (29) in a Taylor series of increasing powers of δT^l, δR^l as follows

S = S_0 + S_1 + S_2 + S_3 + \dots    (32)

with S_p containing the p-th order terms in δT^l, δR^l. These terms are shown explicitly in Appendix III in (90), (91), (92), (95), where it can be seen that S_p is O(n^{1-p/2}), making (32) indeed an asymptotic expansion in inverse powers of n.

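The saddle-point solution of (28), and hence the corresponding values of t_l, r_l, is found by demanding that S be stationary with respect to variations in T^l, R^l [45]. This means that S_1 = 0 (see (91)), which is analogous to setting the first derivative of a function to zero in order to find its maximum or minimum. This produces the following saddle-point equations:

r_l = \frac{\rho_l}{n_t}\, \mathrm{Tr}\left\{ Q T_l \left(I_{n_t} + Q\tilde{T}\right)^{-1} \right\}    (33)
t_l = \frac{1}{n_t}\, \mathrm{Tr}\left\{ R_l \left(I_{n_r} + \tilde{R}\right)^{-1} \right\}    (34)

where T̃, R̃ have been defined as

\tilde{T} = \sum_{l=0}^{L-1} \rho_l\, t_l\, T_l    (35)
\tilde{R} = \sum_{l=0}^{L-1} r_l\, R_l    (36)

Equivalently, (33) and (34) are the conditions that the leading-order mean Γ in (38) below be stationary with respect to t_l and r_l, respectively.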



The next term in the expansion of S is S_2, which needs to be taken into account non-perturbatively, because it is O(1) in the number of antennas n and thus will provide a finite correction. Fortunately, S_2 is quadratic in the variables δT^l and δR^l, so that the integral (28) is just a Gaussian integral at this order.

In contrast, S_p terms with p > 2 become vanishingly small at large n, since they are O(n^{1-p/2}). Therefore, they can be expanded from the exponent in (28) and treated perturbatively as follows:

g(\nu) = e^{-S_0} \int d\mu(\{\delta T^l, \delta R^l\})\, e^{-S_2}
  \times \left[ 1 - S_3 - S_4 + \frac{1}{2} S_3^2 + \dots \right]    (37)



Each term in this expansion can be evaluated explicitly, with higher-order terms producing corrections of increasingly higher order in 1/n. Subsequently, taking the logarithm of the result as prescribed in (21) will produce a 1/n-expansion for the cumulant moments of I, with only integer powers of 1/n surviving [14].

B. Ergodic Capacity 

From S in Appendix III we see that S_0 = νΓ, with the proportionality factor Γ being the leading term of the mean mutual information, which is given by

\Gamma = M \log\det\left(I_{n_t} + Q\tilde{T}\right) + M \log\det\left(I_{n_r} + \tilde{R}\right) - M n_t \sum_{l=0}^{L-1} t_l\, r_l    (38)

where t_l, r_l, T̃ and R̃ are given by (33)-(36).

Note that the above equations are independent of the relative delays between paths, thereby applying to narrowband channels as well as to wideband channels with non-trivial delays between paths. This is to be expected, since the ergodic wideband capacity is independent of delay [29].
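The saddle-point equations (33)-(36) lend themselves to fixed-point iteration, after which (38) gives Γ. The sketch below is our own illustration, not the procedure used by the authors: it alternates undamped updates of t_l and r_l (convergence is not guaranteed in general), and uses small dense-matrix helpers with hypothetical names. For an i.i.d. channel (T = R = Q = I, n_t = n_r = n, one path with SNR ρ) the equations reduce to r^2 + r - ρ = 0 and t = 1/(1 + r), which gives a closed-form check: for ρ = 2, t = 1/2, r = 1 and Γ = n(2 log 2 - 1/2) for M = 1. Splitting that path into two identical half-power paths must leave Γ unchanged.

```python
import math

def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def lin_comb(coeffs, mats):
    """sum_l coeffs[l] * mats[l] for equally sized square matrices."""
    n = len(mats[0])
    return [[sum(c * Mx[i][j] for c, Mx in zip(coeffs, mats)) for j in range(n)]
            for i in range(n)]

def add_identity(A):
    return [[A[i][j] + (1.0 if i == j else 0.0) for j in range(len(A))] for i in range(len(A))]

def inv(A):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(A)
    aug = [[complex(x) for x in row] + [complex(i == j) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def logdet(A):
    """log|det A| from the pivots of Gaussian elimination."""
    n = len(A)
    A = [[complex(x) for x in row] for row in A]
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        total += math.log(abs(A[c][c]))
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return total

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def saddle_point(T_list, R_list, rho, Q, tol=1e-12, max_iter=10000):
    """Fixed-point iteration for t_l, r_l in (33)-(36)."""
    nt, L = len(Q), len(T_list)
    t, r = [1.0] * L, [0.0] * L
    for _ in range(max_iter):
        Rinv = inv(add_identity(lin_comb(r, R_list)))                   # (I + R~)^-1
        t_new = [trace(matmul(Rl, Rinv)).real / nt for Rl in R_list]     # (34)
        Tt = lin_comb([rho[l] * t_new[l] for l in range(L)], T_list)     # (35)
        Tinv = inv(add_identity(matmul(Q, Tt)))                          # (I + Q T~)^-1
        r_new = [rho[l] * trace(matmul(matmul(Q, T_list[l]), Tinv)).real / nt
                 for l in range(L)]                                      # (33)
        change = max(abs(x - y) for x, y in zip(t_new + r_new, t + r))
        t, r = t_new, r_new
        if change < tol:
            break
    return t, r

def mean_mutual_information(T_list, R_list, rho, Q, M=1):
    """Leading-order mean Gamma of (38)."""
    t, r = saddle_point(T_list, R_list, rho, Q)
    nt, L = len(Q), len(T_list)
    Tt = lin_comb([rho[l] * t[l] for l in range(L)], T_list)
    Rt = lin_comb(r, R_list)
    return M * (logdet(add_identity(matmul(Q, Tt))) + logdet(add_identity(Rt))
                - nt * sum(tl * rl for tl, rl in zip(t, r)))

# i.i.d. check: one path, n_t = n_r = 2, rho = 2, Q = I.
I2 = eye(2)
t, r = saddle_point([I2], [I2], [2.0], I2)
gamma = mean_mutual_information([I2], [I2], [2.0], I2)
print(t, r, gamma)
```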

To obtain the capacity-achieving input distribution Q, ⟨I⟩ has to be optimized subject to the power constraint Tr{Q} = n_t. This constraint is enforced by adding a Lagrange multiplier to ⟨I⟩, i.e.

\langle I \rangle \to \langle I \rangle - \lambda\left(\mathrm{Tr}\{Q\} - n_t\right) = \langle I \rangle - \lambda\left(\sum_{i=1}^{n_t} q_i - n_t\right)    (39)

where q_i are the n_t eigenvalues of Q. As in [14], the eigenvectors of the optimal Q are the same as those of T̃ (at least to O(1/n)). This statement is proven in Appendix V.

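With the constraint that Q and T̃ should be diagonal in the same basis, we can find the optimal Q by differentiating with respect to the eigenvalues q_i. Since Γ is stationary with respect to t_l and r_l by (33), (34), only the explicit dependence of Γ on Q through log det(I_{n_t} + QT̃) contributes to this derivative. It is then easy to see [14] that the optimal eigenvalues of Q are given by

q_i = \left[ \frac{1}{\lambda} - \frac{1}{\tau_i} \right]_+    (40)

where τ_i are the n_t eigenvalues of T̃ and [x]_+ = (x + |x|)/2. Here, the Lagrange multiplier λ > 0 is determined by imposing the power constraint

\mathrm{Tr}\{Q\} = \sum_{i=1}^{n_t} q_i = n_t    (41)

with q_i given by (40).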
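Equations (40)-(41) define a water-filling allocation. Given the eigenvalues τ_i of T̃ (assumed positive), the level 1/λ can be found by bisection, since Σ_i [1/λ - 1/τ_i]_+ increases monotonically with 1/λ. The sketch below treats the τ_i as fixed; in the full problem T̃ itself depends on Q through the t_l of (33)-(36), and the complete procedure for the capacity-achieving Q is described in Appendix V and is not reproduced here. Function and variable names are our own.

```python
def water_filling(tau, nt, tol=1e-12):
    """Return q_i = [mu - 1/tau_i]_+ with sum_i q_i = nt, where mu = 1/lambda, cf. (40)-(41)."""
    def total_power(mu):
        return sum(max(mu - 1.0 / x, 0.0) for x in tau)

    lo = 0.0
    hi = nt + max(1.0 / x for x in tau)  # total_power(hi) >= nt by construction
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total_power(mid) < nt:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return [max(mu - 1.0 / x, 0.0) for x in tau]

print(water_filling([4.0, 1.0], 2))   # both eigenmodes active
print(water_filling([10.0, 0.1], 2))  # the weak eigenmode is switched off
```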



C. Variance of the Mutual Information 

To obtain the O(ν^2) term in the expansion of log g(ν) in (21), we need only include the next non-vanishing term, S_2; the second line in (37) can be temporarily neglected. Using the saddle-point value S_0 = νΓ in Eq. (37), the integration over δR^l, δT^l can be performed straightforwardly (see [14] for more details), resulting in

g(\nu) \simeq e^{-\nu\Gamma} \prod_{k,k'=0}^{M-1} \left[\det V^{kk'}\right]^{-\nu^2/2}    (42)

where the 2L-dimensional matrix V^{kk'} is given in Appendix III by (94). Thus, by comparing (21) to (42) and matching order by order the terms of the ν-Taylor expansion of log g(ν), the leading term in the variance of the mutual information is

C_2 = \mathrm{Var}(I) = -\sum_{k,k'=0}^{M-1} \log\det V^{kk'} + O(1/n^2)
    = -\sum_{k,k'=0}^{M-1} \log\det\left(I_L - M_{r,2}\, M_{t,2}^{kk'}\right) + O(1/n^2)    (43)

where the L-dimensional matrices M_t,2^{kk'} and M_r,2 are given in Appendix III by (96) and (97). We note that since M_r,2 and M_t,2 are both O(1), the variance is also formally O(1) in the 1/n expansion when n_t and n_r are of the same order.

D. Higher Order Terms 

To obtain higher-order corrections in 1/n, beyond the O(n) and O(1) terms that appear in the average and the variance, respectively, one needs to take into account the terms S_p with p > 2 in (37). These terms give rise to higher-order cumulant moments of the distribution of the mutual information, as well as to higher-order corrections to the first two cumulant moments. In Appendix IV we sketch the calculation of the next leading correction terms, of order O(1/n). Including this additional term, g(ν) can be written as

g(\nu) = e^{-\nu\Gamma} \prod_{k,k'=0}^{M-1} \left[\det V^{kk'}\right]^{-\nu^2/2} \left[ 1 + D_1 + O(n^{-2}) \right]    (44)

where D_1 is given by

D_1 = a_1\, \nu + a_3\, \nu^3    (45)

and a_1 and a_3 are defined in (106), (107); they are indeed O(1/n). Using the cumulant expansion notation of (21) and matching the generated terms above to the appropriate powers of ν, we see that D_1 produces O(1/n) terms in the first cumulant (mean) C_1 and the third cumulant (skewness) C_3:

C_1 = \Gamma - a_1 + O(1/n^2)    (46)
C_3 = -6\, a_3 + O(1/n^3)    (47)
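The order-by-order matching behind (43), (46) and (47) can be spelled out. Assuming g(ν) has the form e^{-νΓ} ∏_{k,k'} [det V^{kk'}]^{-ν²/2} [1 + D_1 + O(n^{-2})], and using log(1 + D_1) = D_1 + O(n^{-2}) because D_1 = O(1/n), the logarithm is compared term by term with the cumulant series (21):

```latex
\begin{aligned}
\log g(\nu) &= -\nu\Gamma - \frac{\nu^2}{2}\sum_{k,k'=0}^{M-1}\log\det V^{kk'} + a_1\nu + a_3\nu^3 + O(n^{-2})\\
            &= -\nu C_1 + \frac{\nu^2}{2}\,C_2 - \frac{\nu^3}{6}\,C_3 + \dots\\
\Rightarrow\quad C_1 &= \Gamma - a_1, \qquad
C_2 = -\sum_{k,k'=0}^{M-1}\log\det V^{kk'}, \qquad
C_3 = -6\,a_3 .
\end{aligned}
```

Cumulants C_p with p ≥ 4 receive no contribution at this order, consistent with their being at most O(1/n^2).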



E. Special Case 1: R_l independent of l

In this section, we will show how the above results simplify when the correlation matrix of the receiver or the transmitter is independent of the path index l. For concreteness, we will only analyze the case where R_l is independent of l, i.e. when the channel correlations take the form (15).
In this case we see from (34) that t_l is independent of the path index l, and thus we may set t = t_l, so that T̃ = tT and R̃ = rR. Furthermore, by summing (33) over l we get

r \equiv \sum_{l=0}^{L-1} r_l = \frac{1}{n_t}\, \mathrm{Tr}\left\{ QT \left(I_{n_t} + tQT\right)^{-1} \right\}    (48)
t = \frac{1}{n_t}\, \mathrm{Tr}\left\{ R \left(I_{n_r} + rR\right)^{-1} \right\}    (49)

where T = Σ_l ρ_l T_l and R = R_l. Thus the mutual information in (38) may be written as

\langle I \rangle = M \log\det\left(I_{n_t} + tQT\right) + M \log\det\left(I_{n_r} + rR\right) - M n_t\, t\, r    (50)

Note that, apart from a redefinition of T to take into account multiple paths, these results are identical to those derived previously for narrowband channels [14].

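To derive a simplified expression for the variance from (43), we note that M_{r,2} now becomes a constant matrix, which can be written as a vector outer product

M_{r,2} = \frac{1}{n_t}\mathrm{Tr}\left\{\left[R\left(I_{n_r} + rR\right)^{-1}\right]^2\right\}\mathbf{v}\mathbf{v}^T \equiv m_{r,2}\,\mathbf{v}\mathbf{v}^T \qquad (51)

where the vector v has elements v_l = 1 for all l = 1, ..., L. The second equality in the above equation defines m_{r,2}. Similarly, M_{t,2} can be written as

\left[M_{t,2}^{kk'}\right]_{ll'} = \frac{p_l p_{l'}}{n_t}\mathrm{Tr}\left\{QT_l\left(I_{n_t} + tQT\right)^{-1}QT_{l'}\left(I_{n_t} + tQT\right)^{-1}\right\}\exp\left[\frac{2\pi i(k-k')(m_l - m_{l'})}{M}\right] \qquad (52)

After some algebra we see that (43) simplifies to

\mathrm{Var}(I) = -\sum_{k,k'}\log\left[1 - m_{r,2}\,m_{t,2}^{k-k'}\right] \qquad (53)

where

m_{t,2}^{k-k'} = \frac{1}{n_t}\mathrm{Tr}\left\{QS_{k-k'}\left(I_{n_t} + tQT\right)^{-1}QS_{k-k'}^{\dagger}\left(I_{n_t} + tQT\right)^{-1}\right\} \qquad (54)

with the matrix S_q defined as

S_q = \sum_l p_l T_l\, e^{2\pi i q m_l/M} \qquad (55)

which is the temporal Fourier transform of the correlation matrices T_l.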
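As a rough numerical companion to this special case, the sketch below solves the two saddle-point equations by plain fixed-point iteration and then evaluates the mean and the variance. It assumes the forms r = Tr{QT(I + tQT)^{-1}}/n_t, t = Tr{R(I + rR)^{-1}}/n_t, the mean M[log det(I + tQT) + log det(I + rR) - n_t r t], and Var(I) = -Σ_{k,k'} log[1 - m_{r,2} m_{t,2}^{k-k'}] with S_q = Σ_l p_l T_l e^{2πiqm_l/M}, all with integer delays m_l. The function name `special_case_1_statistics` and the iteration scheme are illustrative choices of ours, not the authors' procedure.

```python
import numpy as np


def special_case_1_statistics(Q, T_list, p, m, R, M, iters=500):
    """Illustrative sketch: mean and variance of I when R_l = R for all paths.

    Q      : n_t x n_t input covariance
    T_list : per-path transmit correlation matrices T_l
    p      : per-path powers p_l
    m      : integer delay indices m_l
    R      : common receive correlation matrix
    M      : number of frequency channels
    """
    n_t, n_r = Q.shape[0], R.shape[0]
    I_t, I_r = np.eye(n_t), np.eye(n_r)
    T = sum(pl * Tl for pl, Tl in zip(p, T_list))     # T = sum_l p_l T_l
    QT = Q @ T

    # Fixed-point iteration of the saddle-point equations for r and t.
    t, r = 1.0, 1.0
    for _ in range(iters):
        r = np.trace(QT @ np.linalg.inv(I_t + t * QT)).real / n_t
        t = np.trace(R @ np.linalg.inv(I_r + r * R)).real / n_t

    # Mean mutual information over the M frequency channels.
    mean = M * (np.linalg.slogdet(I_t + t * QT)[1]
                + np.linalg.slogdet(I_r + r * R)[1]
                - n_t * r * t)

    # m_{r,2} = Tr{[R (I + rR)^{-1}]^2} / n_t
    B = R @ np.linalg.inv(I_r + r * R)
    m_r2 = np.trace(B @ B).real / n_t

    # With X = (I + tQT)^{-1} Q, m_{t,2}^q = Tr{S_q X S_q^H X} / n_t (cyclic trace).
    X = np.linalg.inv(I_t + t * QT) @ Q
    var = 0.0
    for q in range(M):
        S_q = sum(pl * Tl * np.exp(2j * np.pi * q * ml / M)
                  for pl, Tl, ml in zip(p, T_list, m))
        m_t2 = np.trace(S_q @ X @ S_q.conj().T @ X).real / n_t
        # The summand depends only on (k - k') mod M, so the double sum
        # over k, k' equals M times a single sum over q.
        var -= M * np.log(1.0 - m_r2 * m_t2)
    return mean, var, t, r
```

With T_l = R = Q = I_n, equal powers p_l = p/L and m_l = l, this reduces to the equal-power example treated in Section V-A, which is a convenient check.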



F. Special Case 2: Narrowband Multipath 

As mentioned in the introduction, this approach is applicable to calculating the ergodic average and variance of an arbitrary zero-mean Gaussian channel. This obviously includes a narrowband channel with arbitrary correlations. The only difference in the analysis of this channel is that all delay indices m_l are equal and can thus be set to zero.



V. Analysis of Results 

In the previous section we have seen that, in the limit of large antenna numbers n, the mean mutual information is of order n, while the variance of the distribution is of order unity. In addition, in Appendix IV we find that the skewness (the third cumulant moment) is O(1/n) and that higher cumulant moments are even smaller (O(1/n^2)). In agreement with the narrowband case [14], [17], this suggests that the distribution of the wideband multipath mutual information is also Gaussian for large n. This Gaussian behavior was seen to be very accurate even for small antenna arrays in narrowband channels [14], [18]. Below, we will see that this also holds for wideband multipath channels, by numerically comparing the Gaussian distribution N[⟨I⟩, Var(I)], calculated using (38) and (43), with the simulated distribution obtained from a large number of random channel realizations. We will specifically analyze four representative situations to show the effects of multipath on the distribution of the mutual information of wideband channels.

If the distribution of the mutual information is Gaussian, we can express I_out from (27) as

I_{\rm out} = \langle I\rangle - \sqrt{2\,\mathrm{Var}(I)}\;\mathrm{erf}^{-1}\left(1 - 2P_{\rm out}\right) \qquad (56)

where erf^{-1}(x) is the inverse error function [46]. Clearly, this can only be an approximation, since the mutual information cannot take negative values.
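A minimal sketch of evaluating the Gaussian outage expression numerically, using only the Python standard library. Python's `math` module has no inverse error function, so the sketch relies on the identity -√2 erf^{-1}(1 - 2P) = Φ^{-1}(P), where Φ^{-1} is the standard normal quantile (`statistics.NormalDist().inv_cdf`). The function name and the numbers in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def outage_mutual_information(mean, var, p_out):
    """Gaussian approximation to the outage mutual information.

    Returns I_out such that P(I < I_out) = p_out for I ~ N(mean, var).
    Since NormalDist().inv_cdf(p) = -sqrt(2) * erfinv(1 - 2 p), this equals
    mean - sqrt(2 var) * erfinv(1 - 2 p_out).
    """
    if not 0.0 < p_out < 1.0:
        raise ValueError("p_out must lie strictly between 0 and 1")
    return mean + math.sqrt(var) * NormalDist().inv_cdf(p_out)


# Hypothetical mean (nats) and variance, for illustration only.
I_out = outage_mutual_information(mean=10.0, var=0.5, p_out=0.01)
```

Because the Gaussian has unbounded support, the result can become negative for very small P_out or very large variance, which is exactly the limitation noted above.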

A. Distribution of Wideband Mutual Information for L equal-power, equally-spaced i.i.d. paths

It is instructive to apply the above results to the case of L equal-power paths, with p_l = p/L in (16), with n_t = n_r = n, and with correlation matrices being unity, i.e. R_l = T_l = I_n. Also, for simplicity we assume that the delays of the paths are all equally spaced from each other by τ, i.e. m_l = l. This is a special case of the one discussed in Section IV-E. In this case the optimal input distribution is Q = I_n [14], and (50) becomes

\langle I\rangle = nM\left[\log(1 + pt) + \log(1 + r) - tr\right] \qquad (57)



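with the extremizing values of r and t from (48), (49) given by

r = \frac{p}{1 + pt}, \qquad t = \frac{1}{1 + r} \qquad (58)

Eliminating r, t is the positive root of pt^2 + t - 1 = 0, i.e. t = (\sqrt{1 + 4p} - 1)/(2p), which gives

\langle I\rangle = nM\left[2\log\frac{1 + \sqrt{1 + 4p}}{2} - \frac{\left(\sqrt{1 + 4p} - 1\right)^2}{4p}\right] \qquad (59)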

This result is identical to the one derived elsewhere [14], [47]. 
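As a quick sanity check of (58), (59), the sketch below iterates the two saddle-point relations of (58) to convergence, plugs the result into (57), and compares with the closed form (59). The helper names are ours; the iteration map t → (1 + pt)/(1 + p + pt) is a contraction, so plain fixed-point iteration suffices here.

```python
import math


def saddle_point_iid(p, iters=1000):
    """Fixed-point iteration of (58): r = p/(1 + p t), t = 1/(1 + r)."""
    t = 1.0
    for _ in range(iters):
        r = p / (1.0 + p * t)
        t = 1.0 / (1.0 + r)
    return t, r


def mean_mi_from_saddle_point(p):
    """<I>/(n M) from (57), evaluated at the solution of (58)."""
    t, r = saddle_point_iid(p)
    return math.log(1.0 + p * t) + math.log(1.0 + r) - t * r


def mean_mi_closed_form(p):
    """<I>/(n M) from (59)."""
    s = math.sqrt(1.0 + 4.0 * p)
    return 2.0 * math.log((1.0 + s) / 2.0) - (s - 1.0) ** 2 / (4.0 * p)


for snr in (0.1, 1.0, 10.0, 100.0):
    print(snr, mean_mi_from_saddle_point(snr), mean_mi_closed_form(snr))
```

The two columns printed for each SNR should agree to machine precision, since at the fixed point 1 + pt = 1 + r = 1/t and rt = 1 - t.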
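The variance can be calculated using (53), (54), with S_q in (55) taking the form S_q = (p/L)\sum_{l=0}^{L-1} e^{2\pi i q l/M} I_n, and takes the form

\mathrm{Var}(I) = -\sum_{k,k'=0}^{M-1}\log\left[1 - \left(\frac{tp}{tp + 1}\right)^2\frac{\sin^2\left(\pi L(k-k')/M\right)}{L^2\sin^2\left(\pi(k-k')/M\right)}\right] \qquad (60)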
with t given by (58); for k = k' the ratio of sines is understood as its limiting value L^2. We see that the larger L is, the more peaked the ratio inside the logarithm is, and therefore the smaller the variance. If L = M, the ratio of sines in (60) becomes proportional to a Kronecker delta δ_{kk'}, so that the variance becomes equal to

\mathrm{Var}(I) = -M\log\left[1 - \left(\frac{tp}{1 + tp}\right)^2\right] \qquad (61)



In general, we can say that the variance of the normalized mutual information per channel (i.e. I/M) scales as Var(I/M) ∼ 1/L.
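The following sketch evaluates (60) directly as a double sum over k, k', with the k = k' terms set to their limiting value, and can be used to check the two limits discussed above: L = 1 gives -M^2 log(1 - β), and L = M gives (61), where β = (tp/(1 + tp))^2. The function name is ours.

```python
import numpy as np


def variance_equal_power_paths(p, L, M):
    """Var(I) from (60) for L equal-power, equally spaced i.i.d. paths."""
    s = np.sqrt(1.0 + 4.0 * p)
    t = (s - 1.0) / (2.0 * p)                     # positive root of p t^2 + t - 1 = 0
    beta = (t * p / (1.0 + t * p)) ** 2
    q = np.subtract.outer(np.arange(M), np.arange(M))   # q = k - k'
    at_zero = (q % M) == 0
    den = np.where(at_zero, 1.0, np.sin(np.pi * q / M) ** 2)
    # Ratio of sines; its k = k' limit is L^2.
    ratio = np.where(at_zero, float(L) ** 2, np.sin(np.pi * L * q / M) ** 2 / den)
    return -np.sum(np.log(1.0 - beta * ratio / L ** 2)), beta
```

Dividing the returned variance by M^2 gives Var(I/M), which for L = M equals -log(1 - β)/L, illustrating the 1/L scaling.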













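B. Distribution of Wideband Mutual Information for an exponentially distributed power delay profile

We can also apply this approach to a more realistic version of a multipath channel, namely one with an exponential power delay profile, which can be expressed as

p_l = p\left(1 - e^{-\delta}\right)e^{-\delta l} \qquad (62)

where δ^{-1} = d/τ is the product of the delay constant d with the bandwidth τ^{-1}, and p is the signal-to-noise ratio for the total power-delay profile. We have implicitly assumed here that the number of paths is infinite, L → ∞. For the simple case of uncorrelated channels, where both T_l and R_l are unit matrices, the average mutual information is identical to (57), (59), with p now denoting the total signal-to-noise ratio. This can easily be seen by observing that the average mutual information in (50) depends on the p_l only through T, which here is equal to

T = \sum_{l=0}^{\infty} p_l I_n = p\,I_n \qquad (63)

To calculate the variance of I, we first need to calculate m_{t,2}^{k-k'} and m_{r,2} in (53). The former can be evaluated from (54) by performing the sum (55),

S_q = \sum_{l=0}^{\infty} p_l\, e^{2\pi i q l/M} I_n = \frac{p\left(1 - e^{-\delta}\right)}{1 - e^{-\delta + 2\pi i q/M}}\, I_n \qquad (64)

As a result, m_{t,2}^{q} (and similarly m_{r,2}) can be expressed as

m_{t,2}^{q} = \left(\frac{p}{1 + tp}\right)^2\frac{\left(1 - e^{-\delta}\right)^2}{\left|1 - e^{-\delta + 2\pi i q/M}\right|^2} \qquad (65)

m_{r,2} = \frac{1}{(1 + r)^2} \qquad (66)

so that the normalized variance per channel can be expressed as

\mathrm{Var}(I/M) = -\frac{1}{M^2}\sum_{k,k'}\log\left[1 - m_{r,2}\,m_{t,2}^{k-k'}\right] \qquad (67)

When the number of frequency channels M is large, we can approximate the above sums with integrals over frequency. Since m_{r,2} = t^2 by (58), this gives

\mathrm{Var}(I/M) \approx -\int_0^1 dx\,\log\left[1 - \frac{\beta\left(1 - e^{-\delta}\right)^2}{1 - 2e^{-\delta}\cos(2\pi x) + e^{-2\delta}}\right]

which can be performed analytically, using \int_0^1 \log(a - b\cos 2\pi x)\,dx = \log\left[\left(a + \sqrt{a^2 - b^2}\right)/2\right] for a ≥ |b| (the unperturbed denominator integrates to zero), to give

\mathrm{Var}(I/M) \approx -\log\frac{1 + e^{-2\delta} - \beta\left(1 - e^{-\delta}\right)^2 + \sqrt{\left[1 + e^{-2\delta} - \beta\left(1 - e^{-\delta}\right)^2\right]^2 - 4e^{-2\delta}}}{2} \qquad (68)

where

\beta = \left(\frac{tp}{1 + tp}\right)^2 = \frac{\left(\sqrt{1 + 4p} - 1\right)^4}{16p^2} \qquad (69)

Equation (68) is plotted in Fig. 1 as a function of the delay.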
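To check the large-M closed form for the exponential profile against the finite double sum, the sketch below evaluates Var(I/M) both ways. It takes m_{r,2} m_{t,2}^{q} = β(1 - e^{-δ})^2/|1 - e^{-δ + 2πiq/M}|^2 with β = (√(1 + 4p) - 1)^4/(16p^2), as implied by (65), (66) and (58); the function names are ours. Because the summand is a smooth periodic function of q, the discrete sum converges to the integral very quickly in M.

```python
import numpy as np


def beta_exp(p):
    """beta = (t p / (1 + t p))^2 = (sqrt(1 + 4p) - 1)^4 / (16 p^2)."""
    s = np.sqrt(1.0 + 4.0 * p)
    return (s - 1.0) ** 4 / (16.0 * p ** 2)


def var_per_channel_sum(p, delta, M):
    """Finite-M evaluation of Var(I/M) as a double sum over k, k'."""
    a = np.exp(-delta)
    q = np.subtract.outer(np.arange(M), np.arange(M))             # q = k - k'
    denom = 1.0 - 2.0 * a * np.cos(2.0 * np.pi * q / M) + a ** 2   # |1 - e^{-delta + 2 pi i q/M}|^2
    x = beta_exp(p) * (1.0 - a) ** 2 / denom                       # m_{r,2} m_{t,2}^{k-k'}
    return -np.mean(np.log(1.0 - x))                              # (1/M^2) sum over k, k'


def var_per_channel_closed_form(p, delta):
    """Large-M closed form for Var(I/M)."""
    a = np.exp(-delta)
    A = 1.0 + a ** 2 - beta_exp(p) * (1.0 - a) ** 2
    return -np.log((A + np.sqrt(A ** 2 - 4.0 * a ** 2)) / 2.0)
```

For δ → ∞ (zero delay spread) the closed form tends to the narrowband value -log(1 - β), and for δ → 0 it tends to zero, matching the trend described in the caption of Fig. 1.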



C. Interdependence of spatial and temporal correlations

In the previous section, we analyzed the situation where all paths had the same transmit correlation matrices T_l = I_{n_t}, resulting in significant simplifications. This situation is not necessarily realistic. Typically, each path has an angle spread smaller than the composite angle spread, and a different mean angle of departure from the transmitter [27]. Thus, even if the composite narrowband correlations at the transmitter are assumed to be low, the associated correlations per path may be substantial. It is therefore interesting to compare the mutual information distribution of the following two situations. In the first, all paths have a correlation matrix identical to the narrowband composite correlation matrix. In the second, each path has a different correlation matrix, subject to giving the same narrowband correlation matrix as in the first case. For simplicity we will take the narrowband composite correlation matrix to be unity, with the following per-path correlations between transmitting antennas:

\left[T_l\right]_{ab} = \int\frac{d\phi}{\sqrt{2\pi\delta^2}}\; e^{2\pi i(a-b)d_\lambda\sin\left((\phi + \phi_0)\pi/180\right) - \phi^2/(2\delta^2)} \qquad (70)

with a, b = 1, ..., n_t being the indices of the transmitting antennas. This is a simple model for the antenna correlations of a uniform linear ideal antenna array, with d_λ = d_min/λ the nearest-neighbor antenna spacing in wavelengths, a Gaussian power azimuth spectrum with an angle spread of δ degrees, and a mean angle of departure of φ_0 degrees [48], [49].
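The per-path correlation model can be evaluated numerically by discretizing the angular integral. The sketch below does this on a uniform grid, truncating the Gaussian at ±6 standard deviations and normalizing it on the grid so that the diagonal is exactly one; the truncation, grid size and the half-wavelength spacing d_λ = 0.5 in the usage line are our assumptions, while the 10 paths with 18-degree spread and mean angles 18(l + 1/2) follow the caption of Fig. 2.

```python
import numpy as np


def path_transmit_correlation(n_t, d_lambda, spread_deg, phi0_deg, n_grid=4001):
    """Numerical evaluation of the per-path transmit correlation model.

    Gaussian power azimuth spectrum (std. dev. spread_deg, in degrees) around
    the mean departure angle phi0_deg, uniform linear array with spacing
    d_lambda wavelengths.
    """
    phi = np.linspace(-6.0 * spread_deg, 6.0 * spread_deg, n_grid)
    w = np.exp(-phi ** 2 / (2.0 * spread_deg ** 2))
    w /= w.sum()                                   # discrete, normalized azimuth spectrum
    a = np.arange(n_t)
    # phase[a, j] = 2 pi a d_lambda sin((phi_j + phi0) pi / 180)
    phase = 2.0 * np.pi * np.outer(a, d_lambda * np.sin(np.deg2rad(phi + phi0_deg)))
    V = np.exp(1j * phase)
    # T[a, b] = sum_j w_j exp(i (phase[a, j] - phase[b, j]))
    return (V * w) @ V.conj().T


# 10 paths, 18-degree spread, mean angles 18(l + 1/2); d_lambda = 0.5 is assumed.
paths = [path_transmit_correlation(3, 0.5, 18.0, 18.0 * (l + 0.5)) for l in range(10)]
```

Each matrix is a convex combination of rank-one steering-vector outer products, so it is Hermitian and positive semidefinite by construction.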

In Fig. 2 we see that, although the mean mutual information is identical in the two cases, the variance of the mutual information in the second case is roughly double that of the first case. We thus see that the correlation structure of the underlying paths has a significant effect on the mutual information distribution.



[Fig. 1 plot: Standard deviation of the distribution of the mutual information/Hz for iid channels and exponentially distributed power delay profile]

[Fig. 2 plot: Distribution of mutual information for n_t = n_r = 2 and n_t = n_r = 3; SNR = 1]

Fig. 1. Standard deviation of the mutual information as a function of the normalized delay spread (d/τ) for the case of an exponential power delay profile, for three different signal-to-noise ratios. For zero delay (d = 0), the narrowband result is recovered (y-axis). For delays large compared with the inverse bandwidth (d > τ), the standard deviation of the mutual information decreases. Eq. (68) has been used.

D. Example: L distinct fully correlated paths

As a final example, we describe a simple version of the general non-Kronecker channel case given by (10). In particular, we assume that n_t = n_r = n and that the correlation matrices T_l, R_l are mutually orthogonal, rank-one matrices; e.g., for the transmitter we have T_l = n a_l a_l^†, with a_l^† a_{l'} = δ_{ll'}. This corresponds to a set of L < n orthogonal plane-waves at the transmitter, each of which is connected with a plane-wave arriving at the receiver in orthogonal directions. In this case, (33) and (34) simplify to



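r_l = \frac{p_l q_l}{1 + nq_l p_l t_l} \qquad (71)

t_l = \frac{1}{1 + nr_l} \qquad (72)

where q_l are the L eigenvalues of Q, given by (40). Assuming for simplicity that the p_l are ordered, i.e. p_1 > p_2 > ... > p_L, the final solution for the capacity-achieving input covariance matrix is

Q = \sum_{l=1}^{L_{\rm eff}} q_l\,\mathbf{a}_l\mathbf{a}_l^{\dagger} \qquad (73)

q_l = \frac{1}{\Lambda} - \frac{1}{\sqrt{n\Lambda p_l}} \qquad (74)

where

\frac{1}{\sqrt{\Lambda}} = \frac{1}{2L_{\rm eff}}\left[\sum_{l=1}^{L_{\rm eff}}\frac{1}{\sqrt{np_l}} + \sqrt{\left(\sum_{l=1}^{L_{\rm eff}}\frac{1}{\sqrt{np_l}}\right)^2 + 4nL_{\rm eff}}\right] \qquad (75)

Here, L_eff is the number of non-zero eigenvalues of Q, chosen with the condition

\Lambda < np_l \qquad (76)

for all l ≤ L_eff, which comes from the requirement r_l > 0. The resulting ergodic capacity is

\langle I\rangle = M\sum_{l=1}^{L_{\rm eff}}\left[\log\frac{np_l}{\Lambda} - 1 + \sqrt{\frac{\Lambda}{np_l}}\right] \qquad (77)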



From (35) and (73), we see that the capacity-achieving covariance matrix is a non-trivial linear combination of the T_l, each with coefficient t_l p_l, which is obtained by solving (71), (72) and (74) and therefore depends on the properties of all paths.

Fig. 2. Cumulative distributions (CDF) of the mutual information with two- and three-antenna arrays for signal-to-noise ratio (SNR) p = 1. Ten paths were used, each with an angle spread of 18 degrees, with the mean angle of arrival of the l-th path pointing at 18(l + 1/2) degrees. While the mean mutual information is nearly the same for both correlated and iid cases (1.74 nats for n_t = 3 and 1.16 nats for n_t = 2), the variance of the correlated systems is nearly double the variance of the corresponding iid case (0.357 vs. 0.0171 for n_t = 3 and 0.0274 vs. 0.0171 for the n_t = 2 case). The agreement between the analytic large-N expression and the simulation is very good down to 1% outage.
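The sketch below illustrates the rank-one example numerically. It assumes the closed forms q_l = 1/Λ - 1/√(nΛp_l), with Λ fixed by the trace constraint Σ_l q_l = n over the L_eff strongest paths and L_eff the largest number of paths satisfying Λ < np_l, and per-path contribution log(np_l/Λ) - 1 + √(Λ/(np_l)) to ⟨I⟩/M. These forms are our reading of the saddle-point solution, so the second helper recomputes ⟨I⟩/M for any allocation directly from (71), (72) (solving x t^2 + t - 1 = 0 with x = n p_l q_l) and the mean log det(I + Q Σ_l p_l t_l T_l) + log det(I + Σ_l r_l R_l) - n Σ_l r_l t_l, which lets one check optimality. Both function names are hypothetical.

```python
import numpy as np


def rank_one_paths_optimum(p, n):
    """Water-filling-like solution for L orthogonal rank-one paths, Tr Q = n.

    Returns the sorted powers, Q eigenvalues q_l, Lambda, L_eff and <I>/M.
    """
    p = np.sort(np.asarray(p, dtype=float))[::-1]        # p_1 >= p_2 >= ... >= p_L
    for L_eff in range(len(p), 0, -1):
        c = np.sum(1.0 / np.sqrt(n * p[:L_eff]))
        inv_sqrt_lam = (c + np.sqrt(c ** 2 + 4.0 * n * L_eff)) / (2.0 * L_eff)
        lam = inv_sqrt_lam ** -2
        if lam < n * p[L_eff - 1]:          # condition on the weakest retained path
            break
    active = p[:L_eff]
    q = np.zeros_like(p)
    q[:L_eff] = 1.0 / lam - 1.0 / np.sqrt(n * lam * active)
    mi_per_M = np.sum(np.log(n * active / lam) - 1.0 + np.sqrt(lam / (n * active)))
    return p, q, lam, L_eff, mi_per_M


def mi_per_M_for_allocation(p, q, n):
    """<I>/M for an arbitrary eigenvalue allocation q, via the saddle point."""
    x = n * np.asarray(p) * np.asarray(q)
    t = 2.0 / (1.0 + np.sqrt(1.0 + 4.0 * x))      # root of x t^2 + t - 1 = 0 (t = 1 at x = 0)
    nr = x / (1.0 + x * t)                         # n r_l
    return np.sum(np.log(1.0 + x * t) + np.log(1.0 + nr) - nr * t)
```

Since each per-path term is concave in q_l, any other allocation with Σ_l q_l = n should give a value of `mi_per_M_for_allocation` no larger than the returned optimum.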

VI. Conclusion 

In conclusion, we have presented an analytic approach to calculate the statistics of the mutual information of MIMO systems for the most general zero-mean Gaussian wideband channels. We have also shown how the ergodic capacity can be calculated by optimizing over the Gaussian input signal distribution. The analytic approach is in principle valid for large antenna numbers, in which limit the mutual information distribution approaches a Gaussian, irrespective of the wideband richness of the channel. Thus the outage capacity can be explicitly calculated. Nevertheless, all results have been found numerically to be highly accurate even for arrays with few antennas. Thus our results are applicable to a wide range of multipath problems, including, but not limited to, multipath channels with a few delay taps or with an arbitrary continuous power-delay profile, and dual-polarized antennas with arbitrary correlations. It should also be noted that this method generalizes the so-called IND separable channels analyzed in [10] to general non-separable IND channels with arbitrary non-Kronecker-product correlations.

This analytic approach provides the framework and a simple 
tool to accurately analyze the statistics of throughput of even 
small arrays in the presence of arbitrary channel correlations. 

Appendix I 
Complex Integrals 

Identity 1: Let X, A, B be m × n complex matrices, and let N, M be positive-definite Hermitian n × n and m × m matrices, respectively. Then the following equality holds

\left(\det[N\otimes M]\right)^{-1} e^{-\frac{1}{2}\mathrm{Tr}\{N^{-1}A^{\dagger}M^{-1}B\}} = \int DX\; e^{-\frac{1}{2}\mathrm{Tr}\{NX^{\dagger}MX + A^{\dagger}X - X^{\dagger}B\}} \qquad (78)

where the integration measure _DX is given by Q. 

Proof: See Appendix I in [14]. Note that this formula was printed incorrectly in that reference; here we state the correct form of the identity.

■ 

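There are several useful special cases of this identity. Setting A = B = 0, we obtain

\left(\det[N\otimes M]\right)^{-1} = (\det N)^{-m}(\det M)^{-n} = \int DX\; e^{-\frac{1}{2}\mathrm{Tr}\{NX^{\dagger}MX\}} \qquad (79)

Further setting N = I_n yields

(\det M)^{-n} = \int DX\; e^{-\frac{1}{2}\mathrm{Tr}\{X^{\dagger}MX\}} \qquad (80)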
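The first equality in (79) rests on the Kronecker-product determinant rule det(N ⊗ M) = (det N)^m (det M)^n for N of size n × n and M of size m × m. A quick numerical spot-check with random Hermitian positive-definite matrices, comparing log-determinants to avoid overflow (the helper `random_hpd` is ours):

```python
import numpy as np

rng = np.random.default_rng(0)


def random_hpd(k):
    """Random Hermitian positive-definite k x k matrix."""
    A = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return A @ A.conj().T + k * np.eye(k)


n, m = 3, 4
N, M = random_hpd(n), random_hpd(m)
_, logdet_kron = np.linalg.slogdet(np.kron(N, M))
_, logdet_N = np.linalg.slogdet(N)
_, logdet_M = np.linalg.slogdet(M)
# log det(N (x) M) versus m log det N + n log det M
print(logdet_kron, m * logdet_N + n * logdet_M)
```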

Identity 2 (Hubbard-Stratonovich Transformation): Let U, V be arbitrary complex Mν × Mν matrices, where ν is assumed to be an arbitrary positive integer. Then the following identity holds

(81)

In the above equation, the auxiliary matrices T and R are general complex Mν × Mν matrices, and their integration measure is given by (21). The integration of the elements of R and T is along contours in complex space parallel to the real and imaginary axis, respectively, as discussed in [14].

Proof: See Appendix I in [14]. ■ 

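Appendix II
Derivation of (28), (29)

In this Appendix we will express g(ν) as in (28), (29). We start with (19), assuming that ν is an arbitrary positive integer. Using (80) we can write

\det\left(I_{n_r} + G_kQG_k^{\dagger}\right)^{-\nu} = \int DX_k\; e^{-\frac{1}{2}\mathrm{Tr}\{X_k^{\dagger}X_k + X_k^{\dagger}G_kQG_k^{\dagger}X_k\}} \qquad (82)

where X_k is an n_r × ν-dimensional complex matrix. We then further use (78) to write

e^{-\frac{1}{2}\mathrm{Tr}\{X_k^{\dagger}G_kQG_k^{\dagger}X_k\}} = \int DY_k\; e^{-\frac{1}{2}\mathrm{Tr}\{Y_k^{\dagger}Y_k + X_k^{\dagger}G_kQ^{1/2}Y_k - Y_k^{\dagger}Q^{1/2}G_k^{\dagger}X_k\}} \qquad (83)

where Y_k is an n_t × ν-dimensional complex matrix. Thus, using (82) and (83) and the definition (19) of g(ν), we can write

g(\nu) = \int\prod_k DX_k\,DY_k\; e^{-\frac{1}{2}\sum_k\mathrm{Tr}\{X_k^{\dagger}X_k + Y_k^{\dagger}Y_k\}}\left\langle e^{-\frac{1}{2}\sum_k\mathrm{Tr}\{X_k^{\dagger}G_kQ^{1/2}Y_k - Y_k^{\dagger}Q^{1/2}G_k^{\dagger}X_k\}}\right\rangle \qquad (84)

where k ranges from 0 to M - 1. Note that, as discussed above, we have been able to set all Q_k equal to a single Q.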



To average the bracketed term over channel realizations we use (9) to express G_k in terms of the G_l. The probability density of G_l is defined by (10) and can be written explicitly as

p(G_l) = \det\left[\frac{p_l}{n_t}T_l\otimes R_l\right]^{-1} e^{-\frac{n_t}{2p_l}\mathrm{Tr}\{T_l^{-1}G_l^{\dagger}R_l^{-1}G_l\}} \qquad (85)

The expectation bracket of any operator F({G_l}) which is a function of the G_l's can then be written as

\left\langle F(\{G_l\})\right\rangle = \prod_{l=0}^{L-1}\int DG_l\; p(G_l)\, F(\{G_l\}) \qquad (86)

Note that, using (79), it is easy to see that this probability distribution is properly normalized (i.e., ⟨1⟩ = 1).

We now evaluate the expectation bracket in (84) by rewriting G_k in terms of the G_l and integrating over the channel realizations (using (85) and (86), and applying (78) to perform the integral). As a result we obtain

g{v) = n / ^^'^ E/c Tr{xtx,+Yt Y J ^g^-, 



^i:..>e '""°M'°''"'' Tr{xl,R,X,YtQV^T,Q^/^Y^,} 



Following [14], we use Identity 2 in Appendix I to express the above in a quadratic form in terms of X_k, Y_k by introducing 2L Mν × Mν matrices R'_l, T'_l. These matrices, whenever convenient, will be represented as R'^{kk'}_l, T'^{kk'}_l, i.e. as a set of LM^2 matrices of dimension ν × ν each. Thus the second line of (87) becomes



d^i{{T\n'})\{\{{e^V [Tr{r,',,7^L'}] (88) 
Tr{r,',fcY^Qi/2T,Qi/2Yfc,} 



I kk' 



■ exp 
• exp 



Pi 2,ri(fc-fc')n 

-e M 



Combining (87) and (88) and using (79), we can now integrate over X_k, Y_k, resulting in

g(\nu) = \int d\mu\left(\{\mathcal{T}', \mathcal{R}'\}\right) e^{-S} \qquad (89)

with S given in (29).



Appendix III
Details for Saddle Point Analysis of (28), (29)

Using the change of variables from T', R' to δT', δR' defined in (31), we expand S in (29) in powers of δT', δR', resulting in

Sq = i 



Si 



Mlogdet ^I„, + ^i,p(QT,^ 
Mlogdet ^I„^ + J2 ^'I^'^ - J2 



(90) 



(91) 



k,l 



kk' W 

\Y.'^r{^kk-^'''^l,} 



(92) 



where the 2L-dimensional vector x^{kk'} of ν × ν matrices is defined as



(93) 



and the corresponding 2L-dimensional Hessian V^{kk'} is expressed in block-diagonal form as



~^kk' 



-II 



-II 

'Mt.2 



(94) 



where the matrices M_{t,2}, M_{r,2} in the diagonal blocks have the elements given by (96), (97) for p = 2, with l = 0, ..., L - 1. For p > 2 the expanded terms take the form



Sn — 



i-ir 



-L ^ [mI^^ Tr {STZil^,^ ■ ■ ■ 6TZ[^^,^ } (95) 

kp,lp 

where the p-dimensional integer-valued vectors l = [l_1 ... l_p], k = [k_1 ... k_p] are being summed over. The coefficients in this Taylor expansion have the form



n 



(96) 



r,p 



n 



(97) 



Note that, although Q appears above in the form Q^{1/2}T_lQ^{1/2}, in (90), (96) it is possible to combine the two factors Q^{1/2} into a single Q.

Appendix IV
Higher Order Terms

In this section we will follow the formulation of [14] to calculate the leading 1/n correction to g(ν), which contributes as a leading term to the skewness C_3 and as a correction to the average mutual information C_1.

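We define an expectation bracket of F(δT, δR), an arbitrary function of δT, δR, as

\langle\langle F\rangle\rangle = \prod_{kk'}\left|\det V^{kk'}\right|^{\nu^2/2}\int d\mu(\delta\mathcal{T}, \delta\mathcal{R})\; e^{-S_2}F(\delta\mathcal{T}, \delta\mathcal{R}) \qquad (98)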

To calculate such expectations, we expand the function F in its arguments and then perform the Gaussian integral. Thus only integrals over even powers of δT, δR survive. To evaluate the expanded terms we need the following second-order moments:



ii^^kik2,ab^^k3ki,cd)) ^ 
ii^'^kik2,ab^'^k3ki,cd)) = 



-Skiki Sk2k3SadSbcW^^pg^ 
-Skxki^k^ksSad^hcW^^p^ 
-5k^kJk2kJadhcW^^p^ 



(99) 



where, for each k_1, k_2, the L × L matrices W_i^{k_1k_2}, i = 1, 2, 3, are given in terms of the L × L matrices M_{r,2}, M_{t,2} (see (96), (97)) by the following expressions



^^k^k2 



-Mt,2 [M^,2Mt,2 - 
-M^,2 [Mt,2M^,2 
[M,,2Mt.2-lL]"' 



(100) 



independent of k_1, k_2. In our particular case, the function F(δT, δR) is exp[-Σ_{p>2} S_p], with S_p expressed in terms of δT, δR as in (95). We now expand the exponential by combining terms with equal powers of n. To do this, we note from (95) that ⟨⟨S_p⟩⟩ is of order n^{1-p/2} for p even, while it is zero for p odd. Keeping only the O(n^{-1}) terms, g(ν) takes the form

g(\nu) = e^{-\nu\Gamma}\prod_{kk'}\left|\det V^{kk'}\right|^{-\nu^2/2}\left[1 + D_1 + O(n^{-2})\right] \qquad (101)

where

D_1 = \left\langle\left\langle -S_4 + \tfrac{1}{2}S_3^2\right\rangle\right\rangle \qquad (102)

which is of order 1/n.

To evaluate D_1 we need to calculate ⟨⟨S_4⟩⟩ and ⟨⟨S_3^2⟩⟩, which, as seen in (95), include fourth-order and sixth-order products in δT, δR, respectively. These can be calculated by applying Wick's theorem (see [14] or [50]), i.e., by "pairing" all δT's and δR's with each other and using (99) to calculate the corresponding quadratic moments. As an example, we evaluate below the term in S_4 which is proportional to the transmit coefficients M_{t,4} in (95).



M 

E 

M 

E 



E 

-1 a,b,c,d- 

E 

-1 a.b.c.d- 



mil 



k2 ,ab ' ' 



ia)) (103) 



of ( I38> given also below 
r(Q,{t;},{ri}) = Aflogdet(l„, 
+ Mlogdet(l„,, 



QTJ 



/C4/C1 ,rfa)) 

\\//X-DP2 ) 

mi\u.,abSRi:k,.j)mZk,M^Kk.,J) 

1,P3P4 1,P1P4 1,P2P3 y 



with respect to ti, ri for Z = 0, . . . , L — 1. The saddle-point 
equations are given by (I33> . (I34> also seen below: 



kiki 



n 



ti 



pi_. 

n-t 
1 
nt 



Tr^QTi 



QT 



-Tr<^R 



H 



R 



' " l,PiP3 i,P2Pi y^^^j 
We can similarly evaluate the second term in S_4 as well as S_3^2 to get

D_1 = a_1\nu + a_3\nu^3 \qquad (105)

where a_1 and a_3 are given, respectively, by



E 

P1---P4 



AT 



4 "l,pij>3"l 



kk 

,P2P4 



ML 



lP2P3P4,yykk 



kk 



E 



2,PlP3 2,p2P4 J 
P2P3 j^^P4P5P6l 



(3 L r,3 r,3 l,PiP2 1,P3P4 1 



t,3 



fefc 



fefc 



fcfc 

2,pip2 ""^ 2,P3P4 '''' 2,p5P6 



'4P5P6 



t,3 



kk 



kk 



kk 

3,PlP4 3,P2P5 3,P3P6 



J} 



It should be noted that the mutual information is an extremum of S in a larger complex space of the elements of the matrices {T', R'}, but for simplicity we only focus on the dependence of Γ(Q, {t_l}, {r_l}) on the 2L-dimensional space of {t_l, r_l}. In this case one can view Γ(Q, {t_l}, {r_l}) as a function of 2L + n_t^2 - 1 variables, where the last n_t^2 - 1 are the degrees of freedom of Q, an n_t-dimensional Hermitian complex matrix with fixed trace. Extremizing the above function over {t_l, r_l}, one can eliminate all {t_l, r_l} using the above equations. Thus, for fixed Q, the mutual information can be written as I(Q) = Γ(Q, {t_l(Q)}, {r_l(Q)}), with t_l(Q), r_l(Q) functions of Q. Suppose now that we maximize I(Q) with respect to Q under the constraint Tr Q = n_t, and that Q_0 is the optimal matrix. As a result, I(Q_0) = Γ(Q_0, {t_l(Q_0)}, {r_l(Q_0)}) is a maximum over Q and an extremum over t_l, r_l. Thus, if one varies Q_0 locally keeping its eigenvalues (and trace) fixed, the variation of I(Q) will vanish to first order in the variation. The most general such variation can be written as



kk 

.P5P6 



E { 



Pl...p4 fclfc2fc 



(107) 



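Q(\lambda) = e^{i\lambda H}Q_0\,e^{-i\lambda H} \qquad (108)

\phantom{Q(\lambda)} = Q_0 + i\lambda[H, Q_0] + O(\lambda^2) \qquad (109)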



^Pl^P2P3P4 
^^P1P2P3P4 



\^rklk2 T^klk: 

1,P1P2 1,P 



,PlP2 

rkik2 



P3P4 

k\k3 



T^klk 
l,Pl 



Pi.-.pe feife2fc3 
rkik-i 



2,pip2 2,P3P4 



PlPA 



^k2k 



P2P3 

k^kz 



2,pip4 2,p2P3 



where H is an arbitrary traceless Hermitian matrix, λ is a small scalar, and the notation [a, b] = ab - ba denotes the commutator. Thus the first derivative of Γ with respect to λ has to vanish at λ = 0. Therefore we have

Since Γ is an extremum with respect to {r_l, t_l}, the partial derivatives of Γ with respect to {r_l, t_l} vanish. We are left with the first term, ∂Γ/∂λ, which should also vanish if Γ is a maximum over Q, resulting in
Appendix V
Capacity-Achieving Input Signal Covariance Q

In this Appendix we will show that the capacity-achieving input distribution Q is diagonal in the basis of T defined in (35). To start the proof we point out that the mutual information (to order O(1/n)) for a given Q is the extremum of (38) with respect to t_l, r_l, as discussed above.

The matrix Z entering the condition (112) is

Z = \left(I_{n_t} + Q_0T\right)^{-1} - \left(I_{n_t} + TQ_0\right)^{-1} \qquad (113)

Now, since H is an arbitrary traceless Hermitian matrix, the condition (112) is equivalent to the statement that Z is proportional to the identity matrix. However, it is easy to see from (113) that Z must be traceless, since Q_0T and TQ_0 have the same eigenvalues. This implies that our extremization condition is equivalent to Z = 0, or

TQ_0 = Q_0T \qquad (114)

which requires that Q_0 and T have the same eigenvectors whenever Q_0 is a maximum.
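A small numerical illustration of the last step, assuming Z = (I + Q_0T)^{-1} - (I + TQ_0)^{-1} with Q_0 and T positive semidefinite: Z is always traceless, it is generically non-zero, and it vanishes once Q_0 is built in the eigenbasis of T, i.e. once Q_0 and T commute. The matrices are random and the helper names are ours; this only illustrates the algebra, not the optimization itself.

```python
import numpy as np

rng = np.random.default_rng(1)


def random_psd(k):
    """Random Hermitian positive-semidefinite k x k matrix."""
    A = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return A @ A.conj().T


def Z_matrix(Q, T):
    """Z = (I + Q T)^{-1} - (I + T Q)^{-1}."""
    I = np.eye(Q.shape[0])
    return np.linalg.inv(I + Q @ T) - np.linalg.inv(I + T @ Q)


k = 4
T = random_psd(k)
Z_generic = Z_matrix(random_psd(k), T)

# A Q that shares the eigenvectors of T, so that T Q = Q T.
_, U = np.linalg.eigh(T)
Q_aligned = U @ np.diag(rng.uniform(0.5, 1.5, k)) @ U.conj().T
Z_aligned = Z_matrix(Q_aligned, T)
```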